Abstract


Metrics  are  often  used  to  compare  the  performance  of  newly  developed  systems  with 
the  performance  of  their  predecessors.  Metrics  can  also  be  used  to  compare  the 
output  of  a  simulator  with  real-world  data  to  test  the  accuracy  of  the  simulation. 
Statistical  comparison  of  these  metrics  can  be  necessary  when  making  such  a 
determination.  There  are  different  methods  of  statistical  comparison  that  are 
sensitive  to  the  various  types  of  underlying  distribution  of  the  metric  data. 
Distribution  type  can  affect  the  performance  of  these  tests,  and,  fortunately,  the 
distributions of many common metrics are well known. For example, mean time to
repair (MTTR) and mean flight hours between critical failures (MFHBCF) generally
follow a log-normal and an exponential distribution, respectively. This paper
presents  the  effects  of  distribution  type  and  parameters  on  the  statistical  power  of 
two  common  goodness-of-fit  tests  (Kolmogorov-Smirnov  and  Anderson-Darling)  via 
Monte  Carlo  simulation. 
Introduction


The  results  of  a  Monte  Carlo  simulation  that  calculates  the  statistical  power  of  two 
common  goodness-of-fit  (GoF)  tests  are  presented  and  analyzed  in  this  paper. 
Various  distribution  types  are  considered,  including  the  normal,  lognormal,  and 
exponential  distributions.  The  results  of  this  study  provide  required  sample  size  as  a 
function  of  statistical  power.  The  presented  data  can  be  used  to  determine  the 
minimum  required  sample  size  for  a  desired  level  of  power.  The  simulation 
methodology  can  be  adapted  to  calculate  statistical  power  for  the  same  distributions 
with  different  parameters  or  other  distribution  types. 

Goodness-of-Fit Testing

Goodness-of-fit  (GoF)  testing  is  a  technique  used  to  determine  how  well  a  statistical 
model  fits  a  data  set.  Single-sample  GoF  tests  consider  a  null  and  an  alternative 
hypothesis  to  confirm  whether  a  sample  could  have  been  drawn  from  a  population 
with  a  particular  distribution.  Multi-sample  GoF  tests  determine  whether  the  samples 
could  have  been  drawn  from  populations  with  the  same  distribution.  Thus,  GoF  tests 
are  useful  for  validating  whether  simulation  output  is  similar  to  real-world  data,  and 
for  comparing  the  performance  of  a  new  system  to  that  of  a  previous  generation. 
Two  such  tests,  Kolmogorov-Smirnov  (KS)  and  Anderson-Darling  (AD),  are  the 
subjects  of  discussion  in  this  paper,  and  their  behaviors  in  terms  of  statistical  power 
are  analyzed  and  presented.  Determining  statistical  power  is  important  for  test 
design  because  it  enables  the  designer  to  choose  a  minimum  sample  size  required  to 
detect  a  difference  between  samples  (i.e.,  the  GoF  result  may  be  too  unreliable  if  the 
required  sample  size  is  not  used  for  the  test). 

Two-Sample KS and AD Tests

The  two-sample  KS  and  AD  tests  are  GoF  tests  used  to  infer  whether  two  samples 
were  drawn  from  populations  with  the  same  distribution.  In  both  tests,  the  empirical 
distribution function (EDF) of each sample is used to calculate the test statistic. The
EDF of a sample of size n is a step function that rises by 1/n at each observed value,
as shown in Figure 1 for the case of two normally distributed samples. If the value of
the test statistic is larger than a critical value for a given significance level, or if the
p-value is less than that significance level, the null hypothesis is rejected and one can
infer that the samples were drawn from populations with dissimilar distributions. Both
tests can accommodate equal or unequal sample sizes. The test statistics for the KS and
AD tests are shown in Equations 1 and 2, respectively [1].


Figure 1. Empirical distribution functions of two randomly collected and normally
distributed samples with μ and σ of 1


$$KS = \max_x \left| F_{n_1}(x) - G_{n_2}(x) \right| \tag{1}$$

where $F_{n_1}(x)$ and $G_{n_2}(x)$ are EDFs of the two samples. The equations used to
determine the KS critical values for varying levels of significance are shown in
Appendix B as a function of $c(\alpha)$, $n_1$ and $n_2$.
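As an illustration of Equation 1, the following is a minimal Python sketch of the EDF and the two-sample KS statistic. It is not the code used in this study (that code is in Appendices C and D); the function names `edf` and `ks_statistic` and the toy data are assumptions made for this example only.

```python
from bisect import bisect_right


def edf(sample):
    """Return F(x) = (number of observations <= x) / n for the given sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(sample1, sample2):
    """Largest absolute difference between the two EDFs (Equation 1).

    Both EDFs are constant between observed values, so it is enough to
    evaluate the difference at every point of the pooled sample.
    """
    f = edf(sample1)
    g = edf(sample2)
    pooled = list(sample1) + list(sample2)
    return max(abs(f(x) - g(x)) for x in pooled)


# Toy data (hypothetical): the second sample is shifted to the right.
ks = ks_statistic([1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 4.5, 5.5])
print(ks)
```

Because each EDF only changes at an observed value, the maximum is always attained at one of the pooled data points, which is why no finer grid is needed.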


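$$AD = \frac{n_1 n_2}{N} \int_{-\infty}^{\infty} \frac{\{F_{n_1}(x) - G_{n_2}(x)\}^2}{H_N(x)\{1 - H_N(x)\}} \, dH_N(x) \tag{2}$$

where $F_{n_1}(x)$ and $G_{n_2}(x)$ are EDFs of the two samples with sample sizes $n_1$ and $n_2$,
$n_1 + n_2 = N$ and $H_N(x) = \{n_1 F_{n_1}(x) + n_2 G_{n_2}(x)\}/N$.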
Equation 2 can be generalized in discrete form, as shown in Equation 3 [2]:

$$AD = \frac{N-1}{N^2} \left[ \frac{1}{n_1} \sum_{j=1}^{L} \frac{h_j \left( N F_{n_1 j} - n_1 H_j \right)^2}{H_j (N - H_j) - N h_j / 4} + \frac{1}{n_2} \sum_{j=1}^{L} \frac{h_j \left( N F_{n_2 j} - n_2 H_j \right)^2}{H_j (N - H_j) - N h_j / 4} \right] \tag{3}$$

where $z_j$ is the array with length $L$ of the distinct values of the two samples ordered
from smallest to largest, $N$ is the total number of data points of the two samples
($n_1 + n_2$), $h_j$ is the number of values in the combined samples equal to $z_j$, $H_j$ is the
number of values in the combined samples less than $z_j$ plus one half the number of
values in the combined samples equal to $z_j$, and $F_{n_1 j}$ and $F_{n_2 j}$ are the number of values
in group $n_1$ or $n_2$ that are less than $z_j$ plus one half the number of values in the
specific group equal to $z_j$.
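The definitions above translate fairly directly into code. The following Python sketch computes the discrete two-sample AD statistic from the quantities h, H, and F as defined in the text; it is a readable illustration under those definitions, not the simulation code from Appendix C, and the name `ad_statistic` is an assumption of this example.

```python
def ad_statistic(sample1, sample2):
    """Discrete two-sample AD statistic built from h_j, H_j, and F_ij (Equation 3)."""
    pooled = list(sample1) + list(sample2)
    big_n = len(pooled)
    distinct = sorted(set(pooled))  # the array z of length L

    total = 0.0
    for group in (sample1, sample2):
        n_i = len(group)
        inner = 0.0
        for z in distinct:
            # h: pooled values equal to z
            h = sum(1 for v in pooled if v == z)
            # H: pooled values below z plus half of the pooled ties at z
            big_h = sum(1 for v in pooled if v < z) + 0.5 * h
            # F: group values below z plus half of the group ties at z
            f = (sum(1 for v in group if v < z)
                 + 0.5 * sum(1 for v in group if v == z))
            numerator = h * (big_n * f - n_i * big_h) ** 2
            denominator = big_h * (big_n - big_h) - big_n * h / 4.0
            inner += numerator / denominator
        total += inner / n_i
    return (big_n - 1) / big_n ** 2 * total


# Toy data (hypothetical): fully separated versus interleaved samples.
ad_separated = ad_statistic([1, 2], [3, 4])
ad_interleaved = ad_statistic([1, 3], [2, 4])
print(ad_separated, ad_interleaved)
```

The nested counting loops are quadratic in the sample size, which keeps the sketch close to the written definitions; a production version would sort once and accumulate counts instead.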

The  method  to  determine  the  p-value  of  the  two-sample  AD  test  statistic  is  shown  in 
Appendix  B.  It  was  adapted  from  reference  [2].  Alternatively,  a  critical  value  can  be 
calculated  for  a  direct  comparison  to  the  test  statistic  when  performing  a  hypothesis 
test. 

It  should  be  noted  that  the  KS  test  is  less  complex  than  the  AD  test,  both  on  an 
intuitive  and  a  computational  level.  The  KS  test  statistic  simply  looks  for  the 
maximum  distance  between  EDFs  for  the  two  samples  along  their  entire  range,  and  is 
more  sensitive  to  discrepancies  between  EDFs  toward  the  median,  while  the  AD  test 
statistic integrates over their entirety and includes a weighting term
$[H_N(x)\{1 - H_N(x)\}]^{-1}$ that places greater emphasis on the tails of the EDFs.


Understanding KS and AD Statistical Power
via Monte Carlo Simulation

Statistical  power  is  the  probability  of  correctly  rejecting  the  null  hypothesis  when  the 
alternative  hypothesis  is  true.  It  is  dependent  on  sample  size  and  also  the  difference 
in  parameters  (means  and  variance)  between  the  samples  being  compared.  Because  of 
this,  experiment  designers  must  choose  a  minimal  necessary  sample  size  to  maintain 
a  minimally  acceptable  level  of  power.  However,  since  power  is  sensitive  to 
differences  in  sample  distribution  parameters,  assigning  an  accurate  estimation  of 
power to a statistical test is often nontrivial and could require a priori knowledge of
the sample distribution and parameters. Without this knowledge, or a best estimate of it,
one cannot assign a meaningful level of power to a GoF test. The statistical power of KS
and  AD  tests  can  be  analyzed  for  a  given  range  of  parameters  and  sample  sizes  to 
provide  insight  into  relative  adequacy  of  the  tests. 
Below, the powers of the KS and AD tests are calculated via Monte Carlo simulation
for varying sample sizes and distribution parameters. Normal, lognormal, and
exponential distributions are considered. For normal and lognormal distributions, a
simulation parameter defined as Δμ/σ is used to observe the effect of distribution
parameters on test power. This parameter is the difference in means between the two
samples divided by the sample standard deviation (the standard deviation is
assumed to be constant for all cases in the simulation). It is often referred to as
the “signal-to-noise ratio” when determining statistical power for normal and
lognormal data. For power estimation of the exponential distribution, parameter Δμ
is considered, where Δμ is the difference in means between the samples and
μ = μ0 − Δμ. For brevity, the simulation parameters Δμ/σ for normal and lognormal
and Δμ for exponential are both referred to generally as δ in the schematic in Figure
3.

Since  the  shape  of  the  exponential  probability  density  function  (PDF)  relies  on  the 
mean  parameter,  the  relative  difference  in  means  between  two  exponential 
distributions  cannot  be  used  alone  to  sufficiently  determine  power.  In  other  words, 
for  example,  one  cannot  expect  similar  power  when  considering  one  set  of 
exponential  samples  with  means  of  0.2  and  1.2  and  another  set  with  means  of  4  and 
5,  even  though  the  difference  between  both  is  1.  This  observation  is  displayed  in 
Figure  2,  where  the  relative  shapes  of  two  sets  of  exponential  PDFs  vary  drastically, 
despite  the  same  difference  between  means. 
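The effect can be quantified with a short calculation. The Python sketch below computes the largest vertical gap between two exponential cumulative distribution functions, which is the population quantity that the KS statistic estimates. The function name `max_cdf_gap` is an assumption of this example, and the closed form comes from setting the derivative of the CDF difference to zero.

```python
import math


def max_cdf_gap(mean_a, mean_b):
    """Largest vertical gap between two exponential CDFs with unequal means."""
    rate_a = 1.0 / mean_a
    rate_b = 1.0 / mean_b
    # |exp(-rate_b * x) - exp(-rate_a * x)| peaks where its derivative is zero:
    # rate_a * exp(-rate_a * x) = rate_b * exp(-rate_b * x)
    x_star = math.log(rate_a / rate_b) / (rate_a - rate_b)
    return abs(math.exp(-rate_a * x_star) - math.exp(-rate_b * x_star))


gap_small_means = max_cdf_gap(0.2, 1.2)  # means differ by 1
gap_large_means = max_cdf_gap(4.0, 5.0)  # means also differ by 1
print(gap_small_means, gap_large_means)
```

For the two pairs of means discussed above, the gap is roughly 0.58 in the first case and roughly 0.08 in the second, even though the difference in means is 1 in both. A test based on EDF differences therefore has far more to detect in the first case.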


Figure 2. Exponential PDFs with means of μ = 0.2 and 1.2 and μ = 4 and 5




CNA 


Simulation  Methodology 

The method used in this paper to simulate statistical power as a function of sample
size and δ is shown in Figure 3. The simulation starts by considering a specific
distribution type. Then, two samples with chosen sample sizes n1 and n2 and
parameter δ are randomly drawn. The GoF test is applied at a significance level of 0.2
and the result of the hypothesis test is stored. This sequence is iterated a total of
10,000 times. Then, power is calculated by dividing the number of times the test
rejected the null hypothesis by the total number of iterations. For example, if the null
hypothesis is rejected 9,500 times out of 10,000 total iterations, the calculated power
is 95 percent. This scheme is repeated for varying sample sizes from 4 through 150
with an increment of 2, and δ from 0 through 1 with an increment of 0.1. The Matlab
script used to perform the simulation is in Appendix C. VBA code capable of running
the AD and KS tests is in Appendix D.
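The loop described above can be sketched compactly in Python for the KS test with normally distributed samples. This is a simplified illustration rather than the Appendix C script: it assumes equal sample sizes, uses 2,000 iterations instead of 10,000 to keep the run short, fixes the random seed, and takes the 0.2-significance critical value from Appendix B. The names `ks_statistic` and `ks_power` are assumptions of this example.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute difference between the EDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


def ks_power(delta, n, iterations=2000, c_alpha=1.07, seed=0):
    """Fraction of iterations in which the KS test rejects the null hypothesis.

    delta is the difference in means divided by the standard deviation,
    which is fixed at 1 here; c_alpha = 1.07 corresponds to 0.2 significance.
    """
    rng = random.Random(seed)
    critical = c_alpha * math.sqrt((n + n) / (n * n))
    rejections = 0
    for _ in range(iterations):
        x = [rng.gauss(5.0 - delta, 1.0) for _ in range(n)]
        y = [rng.gauss(5.0, 1.0) for _ in range(n)]
        if ks_statistic(x, y) > critical:
            rejections += 1
    return rejections / iterations


power_no_shift = ks_power(delta=0.0, n=30)  # estimates the false-rejection rate
power_shift = ks_power(delta=1.0, n=30)     # estimates power for a one-sigma shift
print(power_no_shift, power_shift)
```

Running the function over a grid of `n` and `delta` values would give a power surface of the kind described in the results section, although the estimates will differ slightly from the tabulated ones because of the simplifications noted above.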

All AD test simulations use an identical sample size for each iteration (n1 = n2),
whereas the KS simulation uses a sample size offset by 1 (n1 = n2 + 1, N = n1 + n2). Using
this offset accounts for the fact that the critical value of the two-sample KS test does
not vary monotonically with increasing sample size when n1 = n2, especially for small N
[3]. This behavior does not significantly affect the results of this study, considering
that acceptable levels of power (80 percent or greater) are generally achieved with
N > 50. However, tables in references [3] and [4] should be consulted when performing
the KS test for N < 50 to achieve acceptable accuracy.


Figure 3. Monte Carlo simulation flowchart for estimating AD and KS statistical
power
Simulation Results and Discussion

The  AD  test  is  generally  known  to  be  more  sensitive  than  KS,  as  shown  in  Figure  4, 
due  to  its  greater  emphasis  on  the  tails  of  the  data  [1],  and  the  results  of  the 
simulation  in  this  study  reaffirm  this  for  all  distributions  considered.  Statistical 
power is displayed in Figure 5 through Figure 8 as a function of δ and sample size.
The x and y axes represent δ and sample size, respectively, and the color contour in
each plot displays the corresponding level of power calculated in the simulation for a
given δ and sample size. The legend next to each plot correlates the numeric value of
power to the color displayed. Numeric values of power are located in Table 1 through
Table 6 in Appendix A for all distributions except the exponential distribution with
μ0 = 5 because its statistical power is below 0.80 for all values of sample size and δ.

Figure 4. Simulated statistical power for AD and KS tests with normal distribution
(μ = 5, σ = 1) and μ = 4.5
Figure 5. Simulated statistical power for normal distribution using AD (left) and KS
(right) tests


Figure 6. Simulated statistical power for lognormal distribution using AD (left) and KS
(right) tests


Figure 7. Simulated statistical power for exponential distribution with μ0 = 1 using AD
(left) and KS (right) tests


Figure 8. Simulated statistical power for exponential distribution with μ0 = 5 using AD
(left) and KS (right) tests
Summary

The  results  from  this  study  affirm  that  distribution  type  and  parameters  control  the 
statistical  power  of  the  AD  and  KS  tests.  Larger  sample  sizes  will  generally  increase 
power  for  normal,  lognormal,  and  exponential  distributions.  The  statistical  power  of 
exponentially  distributed  data  depends  on  both  the  difference  in  means  between 
samples and the values of the means when using GoF testing. Depending on the
exponential parameter μ0, the AD and KS tests may not be able to achieve desirable
levels of power regardless of sample size.
Appendix A: Tabulated Data from AD
and KS Simulations


Table 1. Required sample size n for given power P (percent) and δ obtained from AD
simulation with normal distribution

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        43       22       14
90       -        64       32       20
95       -        78       40       23
99       -        110      54       39

Table 2. Required sample size n for given power P (percent) and δ obtained from AD
simulation with lognormal distribution

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        41       23       14
90       -        58       30       19
95       -        76       41       26
99       -        104      56       36

Table 3. Required sample size n for given power P (percent) and δ obtained from AD
simulation with exponential distribution and μ0 = 1

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        27       9        -
90       -        36       13       5
95       -        47       18       6
99       -        65       26       9
Table 4. Required sample size n for given power P (percent) and δ obtained from KS
simulation with normal distribution

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        52       28       17
90       -        71       38       25
95       -        87       49       30
99       -        138      67       44

Table 5. Required sample size n for given power P (percent) and δ obtained from KS
simulation with lognormal distribution

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        53       27       16
90       -        73       36       23
95       -        92       46       30
99       -        128      68       45

Table 6. Required sample size n for given power P (percent) and δ obtained from KS
simulation with exponential distribution and μ0 = 1

P (%)    δ=0.2    δ=0.5    δ=0.7    δ=0.9
80       -        31       11       -
90       -        46       17       7
95       -        59       21       7
99       -        77       29       10
Appendix B: Calculating p-Values for
the AD Test and Critical Values for
the KS Test


The method detailed below to calculate the p-value from the k-sample AD test is
drawn from reference [2]. For the case of two samples (k = 2), the method begins by
calculating

$$T = \frac{AD - 1}{\sigma_N} \tag{4}$$

where

$$\sigma_N = \sqrt{\operatorname{var}(AD)}$$

with

$$\operatorname{var}(AD) = \frac{aN^3 + bN^2 + cN + d}{(N-1)(N-2)(N-3)} \tag{5}$$

$$a = (4g - 6)(k - 1) + (10 - 6g)H \tag{6}$$

$$b = (2g - 4)k^2 + 8hk + (2g - 14h - 4)H - 8h + 4g - 6 \tag{7}$$

$$c = (6h + 2g - 2)k^2 + (4h - 4g + 6)k + (2h - 6)H + 4h \tag{8}$$

$$d = (2h + 6)k^2 - 4hk \tag{9}$$

where

$$H = \sum_{i=1}^{k} \frac{1}{n_i}, \qquad h = \sum_{i=1}^{N-1} \frac{1}{i} \tag{10}$$

and

$$g = \sum_{i=1}^{N-2} \sum_{j=i+1}^{N-1} \frac{1}{(N - i)\,j} \tag{11}$$

where $N = n_1 + n_2$.
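Equations 5 through 11 can be evaluated in a few lines. The Python sketch below computes the standard deviation of the AD statistic for two samples; it mirrors the formulas as written above and is intended only as an illustration. The function name `ad_sigma` is an assumption of this example.

```python
import math


def ad_sigma(n1, n2, k=2):
    """Standard deviation of the AD statistic from Equations 5 through 11."""
    big_n = n1 + n2
    cap_h = 1.0 / n1 + 1.0 / n2                    # H: sum of 1/n_i
    h = sum(1.0 / i for i in range(1, big_n))      # h: sum of 1/i, i = 1..N-1
    g = sum(                                       # g: double sum of Equation 11
        1.0 / ((big_n - i) * j)
        for i in range(1, big_n - 1)
        for j in range(i + 1, big_n)
    )
    a = (4 * g - 6) * (k - 1) + (10 - 6 * g) * cap_h
    b = ((2 * g - 4) * k ** 2 + 8 * h * k
         + (2 * g - 14 * h - 4) * cap_h - 8 * h + 4 * g - 6)
    c = ((6 * h + 2 * g - 2) * k ** 2 + (4 * h - 4 * g + 6) * k
         + (2 * h - 6) * cap_h + 4 * h)
    d = (2 * h + 6) * k ** 2 - 4 * h * k
    variance = (a * big_n ** 3 + b * big_n ** 2 + c * big_n + d) / (
        (big_n - 1) * (big_n - 2) * (big_n - 3)
    )
    return math.sqrt(variance)


sigma_10_10 = ad_sigma(10, 10)
print(sigma_10_10)
```

With the value returned here, T in Equation 4 follows as `(AD - 1) / ad_sigma(n1, n2)` for a two-sample AD statistic.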

Once the value of T is obtained, the value of ln(p) can be interpolated from Table 7.
The log-transformed values of p are used because they vary approximately linearly
with T. This result is then exponentiated to obtain p.

Table 7. Percentiles and log-transformed percentiles of the T distribution [2]

T        0.326    0.626    1.225    1.960    2.719    3.752
P        0.25     0.2      0.1      0.05     0.025    0.01
ln(P)    -1.386   -1.609   -2.303   -2.996   -3.689   -4.605
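The interpolation can be sketched in Python as follows. The sketch interpolates linearly in ln(p) between the tabulated points and reuses the first and last segments to extrapolate beyond the table, which is how the code in Appendices C and D treats values outside the tabulated range. The cap at 1 is an addition made for this example, and the name `p_value_from_t` is likewise an assumption.

```python
import math

# Tabulated points from Table 7
T_POINTS = [0.326, 0.626, 1.225, 1.960, 2.719, 3.752]
LN_P_POINTS = [-1.386, -1.609, -2.303, -2.996, -3.689, -4.605]


def p_value_from_t(t):
    """Interpolate ln(p) linearly in T, then transform back to p."""
    # Find the segment that brackets t; the end segments handle extrapolation.
    idx = 0
    while idx < len(T_POINTS) - 2 and t > T_POINTS[idx + 1]:
        idx += 1
    t0, t1 = T_POINTS[idx], T_POINTS[idx + 1]
    y0, y1 = LN_P_POINTS[idx], LN_P_POINTS[idx + 1]
    ln_p = y0 + (t - t0) * (y1 - y0) / (t1 - t0)
    # Added guard: extrapolation to very small T could otherwise exceed 1.
    return min(1.0, math.exp(ln_p))


p_at_table_point = p_value_from_t(1.960)
print(p_at_table_point)
```

At a tabulated value of T the function returns the corresponding tabulated p to within the rounding of the table, for example approximately 0.05 at T = 1.960.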

The critical value for the two-sample KS test is

$$KS_{crit} = c(\alpha) \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \tag{12}$$

with values of $c(\alpha)$ shown in Table 8.


Table 8. KS critical value parameters for various levels of significance

c(0.2)    c(0.1)    c(0.05)    c(0.01)
1.07      1.22      1.36       1.63
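As a small worked example, the critical value can be computed in Python from the Table 8 parameters. The dictionary `C_ALPHA`, the function name `ks_critical_value`, and the sample sizes are assumptions of this example.

```python
import math

# c(alpha) values from Table 8
C_ALPHA = {0.2: 1.07, 0.1: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_critical_value(n1, n2, alpha):
    """Two-sample KS critical value for the given sample sizes and significance."""
    return C_ALPHA[alpha] * math.sqrt((n1 + n2) / (n1 * n2))


# Hypothetical sample sizes offset by 1, evaluated at 0.2 significance.
critical = ks_critical_value(26, 25, 0.2)
print(critical)
```

For these sample sizes the critical value is about 0.30, so the null hypothesis would be rejected at 0.2 significance when the KS statistic exceeds that value.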
Appendix C: Matlab Simulation
Code


clear all
close all
clc

format long g

numsamp=2;     % number of samples being compared (k)
variance=1;    % passed to normrnd as the standard deviation (equal to 1 here)
w=1;           % column index for the current value of s

for s=[4:.1:5]   % mean of sample 1; sample 2 always has mean 5

    clearvars -EXCEPT s ADPower KSPower numsamp variance w ratio
    for ssize=4:2:150
        for k=1:10000

            % draw the two samples and pool them
            x1=normrnd(s,variance,ssize,1);
            x1=sort(x1);
            x2=normrnd(5,variance,ssize,1);
            x2=sort(x2);
            xtot=sort([x1;x2]);
            xtotu=unique(sort([x1;x2]));
            l=length(unique(sort([x1;x2])));

            n=length(x1);
            m=length(x2);
            tot=n+m;
            smallh=0;
            countf=0;
            countg=0;

            % KS critical value at 0.2 significance (Equation 12)
            critical=1.07*sqrt((n+m)/(n*m));

            for i=1:(length(xtotu)-1)
                smallh=0;
                countf=0;
                countg=0;
                bighcount=0;
                bigfcount=0;
                biggcount=0;

                %%%%%%%%%%%%%%%%%%%%%%%% start AD test

                % pooled sample: ties (smallh) and midrank count (bigh)
                for j=1:length(xtot)
                    if xtotu(i)==xtot(j)
                        smallh=smallh+1;
                    end
                end

                for j=1:length(xtot)
                    if xtotu(i)<xtot(j)
                        bighcount=bighcount+1;
                    end
                end

                bigh=bighcount+.5*smallh;

                % first sample: ties (countf) and midrank count (bigf)
                for j=1:length(x1)
                    if xtotu(i)==x1(j)
                        countf=countf+1;
                    end
                end

                for j=1:length(x1)
                    if xtotu(i)<x1(j)
                        bigfcount=bigfcount+1;
                    end
                end

                bigf=bigfcount+.5*countf;

                % second sample: ties (countg) and midrank count (bigg)
                for j=1:length(x2)
                    if xtotu(i)==x2(j)
                        countg=countg+1;
                    end
                end

                for j=1:length(x2)
                    if xtotu(i)<x2(j)
                        biggcount=biggcount+1;
                    end
                end

                bigg=biggcount+.5*countg;

                % summands of Equation 3
                ff(i,1)=smallh*(((tot)*bigf-length(x1)*bigh)^2)/(bigh*(tot-bigh)-.25*smallh*tot);
                gg(i,1)=smallh*(((tot)*bigg-length(x2)*bigh)^2)/(bigh*(tot-bigh)-.25*smallh*tot);

            end

            % AD test statistic (Equation 3)
            A2=(tot-1)/(tot^2)*[(1/length(x1))*sum(ff)+(1/length(x2))*sum(gg)];

            % g, h (stored in T), and H (stored in S) from Equations 10 and 11
            g=0;
            for r=1:(tot-2)
                for v=(r+1):(tot-1)
                    g=g+(1/((tot-r)*v));
                end
            end
            T=0;
            for d=1:(tot-1)
                T=T+(1/d);
            end

            S=(1/n)+(1/m);

            % coefficients of Equations 6 through 9
            a=(4*g-6)*(numsamp-1)+(10-6*g)*S;
            b=(2*g-4)*numsamp^2+8*T*numsamp+(2*g-14*T-4)*S-8*T+4*g-6;
            c=(6*T+2*g-2)*numsamp^2+(4*T-4*g+6)*numsamp+(2*T-6)*S+4*T;
            d=(2*T+6)*numsamp^2-4*T*numsamp;

            sigma=((a*tot^3+b*tot^2+c*tot+d)/((tot-1)*(tot-2)*(tot-3)*(numsamp-1)^2))^0.5;

            critval20=1+sigma*(0.877-0.08/((numsamp-1)^0.5)-0.171666/(numsamp-1));

            %%%%%%%%%%%%%%%% T values from table
            T25=0.326;
            T20=0.625666;
            T10=1.225;
            T05=1.96;
            T025=2.719;
            T01=3.752;

            % standardized AD statistic (Equation 4); T is reused here
            T=(A2-1)/sigma;

            %%%%%%%%%%%%%%%% log transformed P values
            P25=-1.386;
            p20=-1.609;
            p10=-2.303;
            p05=-2.996;
            p025=-3.689;
            p01=-4.605;

            % interpolate (or extrapolate) ln(p) from Table 7
            if T<T20 & T>T25
                P=(((T-T25)*(p20-P25))/(T20-T25))+P25;
            elseif T<T10 & T>T20
                P=(((T-T20)*(p10-p20))/(T10-T20))+p20;
            elseif T<T05 & T>T10
                P=(((T-T10)*(p05-p10))/(T05-T10))+p10;
            elseif T<T025 & T>T05
                P=(((T-T05)*(p025-p05))/(T025-T05))+p05;
            elseif T<T01 & T>T025
                P=(((T-T025)*(p01-p025))/(T01-T025))+p025;
            elseif T<T25
                P=((P25-p20)/(T25-T20))*(T-T25)+P25;
            elseif T>T01
                P=((p01-p025)/(T01-T025))*(T-T01)+p01;
            else
                P=9;   % reached only if T equals a tabulated value exactly
            end

            A2;
            P=exp(P);
            Pall(k,1)=P;

            % running AD power: fraction of p-values below 0.20 so far
            counter=0;
            for v=1:k
                if Pall(v,1)<.20
                    counter=counter+1;
                end
            end

            ADPower(ssize-3,w,1)=counter/k;

            %%%%%%%%%%%%%%%%%%%%%%%% start KS test

            % EDF increments of each sample at every pooled observation
            for ii=1:length(xtot)

                for r=1:n
                    if xtot(ii)==x1(r)
                        cdf1(ii)=(1/n);
                        break
                    else
                        cdf1(ii)=0;
                    end
                end

                for r=1:m
                    if xtot(ii)==x2(r)
                        cdf2(ii)=(1/m);
                        break
                    else
                        cdf2(ii)=0;
                    end
                end

            end

            % cumulative sums give the two EDFs
            for i=2:length(cdf1)
                cdf1(i)=cdf1(i)+cdf1(i-1);
            end

            for i=2:length(cdf2)
                cdf2(i)=cdf2(i)+cdf2(i-1);
            end

            % KS test statistic (Equation 1)
            ks=max(abs([cdf1-cdf2]));

            mmax(k,1)=ks;

        end

        % KS power: fraction of iterations exceeding the critical value
        counter=0;
        for v=1:k
            if mmax(v,1)>critical
                counter=counter+1;
            end
        end

        KSPower(ssize-3,w)=counter/k;
        ratio(w,1)=abs(s-4)/sqrt(variance);

    end

    w=w+1;
    s
end

% rows were indexed by ssize-3, so every second row is empty
ADPower(2:2:end,:)=[];
KSPower(2:2:end,:)=[];

index=4:2:150;

% ratio is flipped so that the x axis runs from a mean difference of 1 down to 0
figure
contourf(flipud(ratio),index,ADPower)
figure
contourf(flipud(ratio),index,KSPower)
Appendix D: VBA Code for AD and
KS Tests


Sub adtest()

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'IMPORTANT NOTE: INSERT SAMPLE DATA IN COLUMNS STARTING AT A3 AND B3
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Dim LastRow1 As Double
Dim LastRow2 As Double
Dim lastrowtot As Double
Dim r1 As Range
Dim r2 As Range
Dim x1()
Dim x2()
Dim xtot()
Dim x1length As Double
Dim x2length As Double
Dim xtotlength As Double
Dim smallh As Double
Dim countf As Double
Dim countg As Double
Dim bighcount As Double
Dim bigfcount As Double
Dim biggcount As Double
Dim ff()
Dim gg()
Dim g As Double
Dim S As Double
Dim T As Double
Dim a As Double
Dim b As Double
Dim c As Double
Dim d As Double
Dim sigma As Double
Dim significance As Double
Dim P As Double
Dim cdf1()
Dim cdf2()
Dim dcdf()

'Columns H and I are used as scratch space for the pooled and unique values
ActiveSheet.Range("H3:H9999").ClearContents
ActiveSheet.Range("I3:I9999").ClearContents

numsamp = 2

'Read the two samples from columns A and B
With ActiveSheet
    LastRow1 = .Cells(.Rows.Count, "A").End(xlUp).Row
    x1 = Range("A3:A" & LastRow1).Value
End With

With ActiveSheet
    LastRow2 = .Cells(.Rows.Count, "B").End(xlUp).Row
    x2 = Range("B3:B" & LastRow2).Value
End With

x1length = Application.CountA(x1)
x2length = Application.CountA(x2)
xtotlength = x1length + x2length

x1 = Application.Transpose(x1)
x2 = Application.Transpose(x2)
xtot = x1

'Append the second sample to form the pooled sample
i = 1
While i <= x2length
    ReDim Preserve xtot(1 To x1length + i)
    xtot(i + x1length) = x2(i)
    i = i + 1
Wend

'Sort the pooled sample on the worksheet and read it back
xtot = Application.Transpose(xtot)
Sheets("AD test").Range("H3:H" & xtotlength + 2).Value = xtot
Range("H3:H" & xtotlength + 2).Sort key1:=Range("H3"), _
    order1:=xlAscending, Header:=xlNo
xtot = Range("H3:H" & xtotlength + 2).Value
xtot = Application.Transpose(xtot)

'Copy the distinct pooled values to column I
ActiveSheet.Range("H2:H" & (2 + xtotlength)).AdvancedFilter _
    Action:=xlFilterCopy, CopyToRange:=ActiveSheet.Range("I2"), _
    Unique:=True

With ActiveSheet
    lastrowtot = .Cells(.Rows.Count, "I").End(xlUp).Row
    xtotu = Range("I3:I" & lastrowtot).Value
End With

xtotulength = Application.CountA(xtotu)
xtotu = Application.Transpose(xtotu)
'MsgBox xtotlength

''''''''''''''''''''''''''''''''''''' start AD test
i = 1
While i <= (xtotulength - 1)

    ReDim Preserve ff(1 To i)
    ReDim Preserve gg(1 To i)

    smallh = 0
    countf = 0
    countg = 0
    bighcount = 0
    bigfcount = 0
    biggcount = 0

    'Pooled sample: ties (smallh) and midrank count (bigh)
    For j = 1 To xtotlength
        If xtotu(i) = xtot(j) Then
            smallh = smallh + 1
        End If
    Next

    For o = 1 To xtotlength
        If xtotu(i) < xtot(o) Then
            bighcount = bighcount + 1
        End If
    Next

    bigh = bighcount + 0.5 * smallh

    'First sample: ties (countf) and midrank count (bigf)
    For r = 1 To x1length
        If xtotu(i) = x1(r) Then
            countf = countf + 1
        End If
    Next

    For v = 1 To x1length
        If xtotu(i) < x1(v) Then
            bigfcount = bigfcount + 1
        End If
    Next

    bigf = bigfcount + 0.5 * countf

    'Second sample: ties (countg) and midrank count (bigg)
    For w = 1 To x2length
        If xtotu(i) = x2(w) Then
            countg = countg + 1
        End If
    Next

    For q = 1 To x2length
        If xtotu(i) < x2(q) Then
            biggcount = biggcount + 1
        End If
    Next

    bigg = biggcount + 0.5 * countg

    'Summands of Equation 3 (xtotlength replaces the undefined variable tot)
    ff(i) = smallh * ((xtotlength * bigf - x1length * bigh) ^ 2 / (bigh _
        * (xtotlength - bigh) - 0.25 * smallh * xtotlength))
    gg(i) = smallh * ((xtotlength * bigg - x2length * bigh) ^ 2 / (bigh _
        * (xtotlength - bigh) - 0.25 * smallh * xtotlength))

    i = i + 1
Wend

'AD test statistic (Equation 3); each sum is weighted by its own sample size
A2 = ((xtotlength - 1) / (xtotlength ^ 2)) * ((1 / x1length) * _
    Application.WorksheetFunction.Sum(ff) + (1 / x2length) * _
    Application.WorksheetFunction.Sum(gg))

''''''''''''''''''''''''''''''''''''' critical value
'g, h (stored in T), and H (stored in S) from Equations 10 and 11
g = 0
For i = 1 To (xtotlength - 2)
    For j = (i + 1) To (xtotlength - 1)
        g = g + (1 / ((xtotlength - i) * j))
    Next
Next

T = 0
For d = 1 To (xtotlength - 1)
    T = T + (1 / d)
Next

S = 0
S = 1 / x1length + 1 / x2length

'Coefficients of Equations 6 through 9
a = (4 * g - 6) * (numsamp - 1) + (10 - 6 * g) * S
b = (2 * g - 4) * numsamp ^ 2 + 8 * T * numsamp + (2 * g - 14 * T - _
    4) * S - 8 * T + 4 * g - 6
c = (6 * T + 2 * g - 2) * numsamp ^ 2 + (4 * T - 4 * g + 6) * numsamp _
    + (2 * T - 6) * S + 4 * T
d = (2 * T + 6) * numsamp ^ 2 - 4 * T * numsamp

sigma = ((a * xtotlength ^ 3 + b * xtotlength ^ 2 + c * xtotlength + _
    d) / ((xtotlength - 1) * (xtotlength - 2) * (xtotlength - 3) * _
    (numsamp - 1) ^ 2)) ^ 0.5

critval25 = 1 + sigma * (0.675 - 0.245 / ((numsamp - 1) ^ 0.5) - _
    0.105 / (numsamp - 1))
critval20 = 1 + sigma * (0.877 - 0.08 / ((numsamp - 1) ^ 0.5) - _
    0.171666 / (numsamp - 1))
critval10 = 1 + sigma * (1.281 + 0.25 / ((numsamp - 1) ^ 0.5) - 0.305 _
    / (numsamp - 1))
critval05 = 1 + sigma * (1.645 + 0.678 / ((numsamp - 1) ^ 0.5) - _
    0.362 / (numsamp - 1))
critval025 = 1 + sigma * (1.96 + 1.149 / ((numsamp - 1) ^ 0.5) - _
    0.391 / (numsamp - 1))
critval01 = 1 + sigma * (2.326 + 1.822 / ((numsamp - 1) ^ 0.5) - _
    0.396 / (numsamp - 1))

''''''''''''''''''''''''''''''''''''' P value computation
'T values from table
T25 = 0.326
T20 = 0.625666
T10 = 1.225
T05 = 1.96
T025 = 2.719
T01 = 3.752

'Standardized AD statistic (Equation 4); T is reused here
T = (A2 - 1) / sigma

'log transformed P values
P25 = -1.386
p20 = -1.609
p10 = -2.303
p05 = -2.996
p025 = -3.689
p01 = -4.605

'Interpolate (or extrapolate) ln(p) from Table 7
If T < T20 And T > T25 Then
    P = (((T - T25) * (p20 - P25)) / (T20 - T25)) + P25
ElseIf T < T10 And T > T20 Then
    P = (((T - T20) * (p10 - p20)) / (T10 - T20)) + p20
ElseIf T < T05 And T > T10 Then
    P = (((T - T10) * (p05 - p10)) / (T05 - T10)) + p10
ElseIf T < T025 And T > T05 Then
    P = (((T - T05) * (p025 - p05)) / (T025 - T05)) + p05
ElseIf T < T01 And T > T025 Then
    P = (((T - T025) * (p01 - p025)) / (T01 - T025)) + p025
ElseIf T < T25 Then
    P = ((P25 - p20) / (T25 - T20)) * (T - T25) + P25
ElseIf T > T01 Then
    P = ((p01 - p025) / (T01 - T025)) * (T - T01) + p01
Else
    P = 9 'reached only if T equals a tabulated value exactly
End If
P = Exp(P)

MsgBox "Anderson Darling Test Results" & vbNewLine & vbNewLine & _
    "Test statistic value: " & A2 & vbNewLine & vbNewLine & _
    "Critical value (0.25 significance): " & critval25 & vbNewLine & _
    "Critical value (0.20 significance): " & critval20 & vbNewLine & _
    "Critical value (0.10 significance): " & critval10 & vbNewLine & _
    "Critical value (0.05 significance): " & critval05 & vbNewLine & _
    "Critical value (0.025 significance): " & critval025 & vbNewLine & _
    "Critical value (0.01 significance): " & critval01 & vbNewLine & _
    vbNewLine & "P value = " & Format(P, "0.0000000000000")

''''''''''''''''''''''''''''''''''''' AD hypothesis test results
If A2 < critval25 Then
    h25 = "accept"
Else
    h25 = "reject"
End If

If A2 < critval20 Then
    h20 = "accept"
Else
    h20 = "reject"
End If

If A2 < critval10 Then
    h10 = "accept"
Else
    h10 = "reject"
End If

If A2 < critval05 Then
    h05 = "accept"
Else
    h05 = "reject"
End If

If A2 < critval025 Then
    h025 = "accept"
Else
    h025 = "reject"
End If

If A2 < critval01 Then
    h01 = "accept"
Else
    h01 = "reject"
End If

MsgBox "Anderson Darling Test Results" & vbNewLine & vbNewLine & "At " & _
    "0.25 significance, " & h25 & " the null hypothesis." & vbNewLine & _
    "At 0.20 significance, " & h20 & " the null hypothesis." & vbNewLine _
    & "At 0.10 significance, " & h10 & " the null hypothesis." & _
    vbNewLine & "At 0.05 significance, " & h05 & " the null hypothesis." _
    & vbNewLine & "At 0.025 significance, " & h025 & " the null " & _
    "hypothesis." & vbNewLine & "At 0.01 significance, " & h01 & " the " & _
    "null hypothesis."

''''''''''''''''''''''''''''''''''''' start KS test
'EDF increments of each sample at every pooled observation
For ii = 1 To xtotlength

    For r = 1 To x1length
        If xtot(ii) = x1(r) Then
            ReDim Preserve cdf1(1 To ii)
            cdf1(ii) = (1 / x1length)
            Exit For
        Else
            ReDim Preserve cdf1(1 To ii)
            cdf1(ii) = 0
        End If
    Next r

    For r = 1 To x2length
        If xtot(ii) = x2(r) Then
            ReDim Preserve cdf2(1 To ii)
            'Step is 1 / x2length for the second sample (as in Appendix C)
            cdf2(ii) = (1 / x2length)
            Exit For
        Else
            ReDim Preserve cdf2(1 To ii)
            cdf2(ii) = 0
        End If
    Next r

Next ii

'Cumulative sums give the two EDFs
For i = 2 To Application.CountA(cdf1)
    cdf1(i) = cdf1(i) + cdf1(i - 1)
Next i

For i = 2 To Application.CountA(cdf2)
    cdf2(i) = cdf2(i) + cdf2(i - 1)
Next i

For i = 1 To Application.CountA(cdf1)
    ReDim Preserve dcdf(1 To i)
    dcdf(i) = Abs(cdf1(i) - cdf2(i))
Next i

'KS test statistic (Equation 1)
ks = Application.Max(dcdf)

'KS critical values (Equation 12 with the parameters in Table 8)
critical20 = 1.07 * ((x1length + x2length) / (x1length * x2length)) ^ 0.5
critical10 = 1.22 * ((x1length + x2length) / (x1length * x2length)) ^ 0.5
critical05 = 1.36 * ((x1length + x2length) / (x1length * x2length)) ^ 0.5
critical01 = 1.63 * ((x1length + x2length) / (x1length * x2length)) ^ 0.5

If ks < critical20 Then
    h20 = "accept"
Else
    h20 = "reject"
End If

If ks < critical10 Then
    h10 = "accept"
Else
    h10 = "reject"
End If

If ks < critical05 Then
    h05 = "accept"
Else
    h05 = "reject"
End If

If ks < critical01 Then
    h01 = "accept"
Else
    h01 = "reject"
End If

MsgBox "KS Test Results" & vbNewLine & vbNewLine & "Test statistic " & _
    "value: " & ks & vbNewLine & vbNewLine & "At 0.20 significance, " & _
    h20 & " the null hypothesis." & vbNewLine & "At 0.10 significance, " _
    & h10 & " the null hypothesis." & vbNewLine & "At 0.05 significance, " _
    & h05 & " the null hypothesis." & vbNewLine & "At 0.01 " & _
    "significance, " & h01 & " the null hypothesis."

End Sub